Preparation of MaDiTS corpus for Malay dialect translation and speech synthesis system
Authors
Abstract
This paper presents our work in acquiring a Malay dialect translation and speech synthesis corpus. We propose an architecture for speech corpus acquisition that includes Malay dialect translation and Malay dialect grapheme-to-phoneme (G2P) conversion. The pronunciation dictionary for dialectal Malay was generated with a G2P tool. Because dialectal Malay is a scarce-resource language, dialectal translation rules were developed to translate standard Malay text into dialectal Malay. Kelantanese Malay, a highly distinctive dialect spoken in Kelantan in the northeast of Peninsular Malaysia, was chosen for this research. Evaluation results show that the sentences selected through the proposed approach have a correlation coefficient of about 0.99, which means that they are phonetically well balanced.
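The abstract does not say which two quantities the reported correlation coefficient compares. A common way to check phonetic balance, assumed here purely for illustration, is to correlate the phoneme frequency distribution of the selected sentences with that of the full source text. The sketch below uses toy phoneme sequences and hypothetical helper names (`phoneme_frequencies`, `pearson`); it is not the authors' evaluation code.

```python
from collections import Counter
from math import sqrt


def phoneme_frequencies(utterances, inventory):
    """Relative frequency of each phoneme in `inventory` across all utterances."""
    counts = Counter(p for utt in utterances for p in utt)
    total = sum(counts[p] for p in inventory)
    return [counts[p] / total if total else 0.0 for p in inventory]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant distribution")
    return cov / sqrt(var_x * var_y)


# Toy data (assumption): each utterance is a list of phoneme symbols,
# e.g. as produced by a G2P step. These are not taken from the MaDiTS corpus.
corpus = [
    ["m", "a", "k", "a", "n"],
    ["n", "a", "k"],
    ["k", "a", "m", "a"],
    ["m", "a", "n", "a"],
]
selected = [corpus[0], corpus[2]]  # a hypothetical selected subset

inventory = sorted({p for utt in corpus for p in utt})
r = pearson(
    phoneme_frequencies(corpus, inventory),
    phoneme_frequencies(selected, inventory),
)
print(f"correlation between corpus and selected subset: {r:.3f}")
```

Under this reading, a value close to 1 indicates that the selected sentences reproduce the phoneme distribution of the source text closely.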
Similar resources
Parallel Speech Corpora of Japanese Dialects
Clean speech data is necessary for spoken language processing; however, there is no public Japanese dialect corpus collected for speech processing. Parallel speech corpora of dialects are also important because real dialects affect each other; however, the existing data includes only noisy speech data of dialects and their translations into the common language. In this paper, we collected parallel spee...
Corpus Design for Malay Corpus-based Speech Synthesis System
Problem statement: The speech corpus is one of the major components in corpus-based synthesis. The quality and coverage of the speech corpus affect the quality of the synthesized speech. Approach: This study proposes a corpus design for a Malay corpus-based speech synthesis system. This includes the study of design criteria in corpus-based speech synthesis, Malay corpus based database design and the...
Globalization, Standardization, and Dialect Leveling in Iran
This paper is an attempt to shed light on the effects of modernization, urbanization, a monolingual educational system, and mass media, as well as the process of globalization, on dialect leveling among Persian dialects. To that end, the first part of the paper elaborates on the relationship between globalization and sociolinguistics, and on the concept of standardization. Also, it discusses some ...
An Overview of BPPT's Indonesian Language Resources
This paper describes various Indonesian language resources that the Agency for the Assessment and Application of Technology (BPPT) has developed and collected since the mid-1980s, when we joined MMTS (Multilingual Machine Translation System), an international project coordinated by CICC Japan to develop a machine translation system for five Asian languages (Bahasa Indonesia, Malay, Thai, Japanese, and Chi...
Grapheme to phoneme conversion: an Arabic dialect case
We aim to develop a Speech-to-Speech translation system between Modern Standard Arabic and the Algiers dialect. Such a system must include a Text-to-Speech module, which in turn must include a Grapheme-to-Phoneme converter. The Algiers dialect is an Arabic dialect affected by most of the problems that Modern Standard Arabic poses in NLP. Furthermore, it could be considered as an under-resourced language becau...